{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "realistic-nerve",
   "metadata": {},
   "source": [
    "# Dataprep.Clean GUI"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "upper-classroom",
   "metadata": {},
   "source": [
    "## Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bacterial-visit",
   "metadata": {},
   "source": [
    "The function `clean_df_gui()` shows a web service to help users clean dataframe without any coding."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "strategic-actress",
   "metadata": {},
   "source": [
    "### An example dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "serious-batch",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div><div id=dd01f9ba-4af2-407a-bb81-cad86b9739a3 style=\"display:none; background-color:#9D6CFF; color:white; width:200px; height:30px; padding-left:5px; border-radius:4px; flex-direction:row; justify-content:space-around; align-items:center;\" onmouseover=\"this.style.backgroundColor='#BA9BF8'\" onmouseout=\"this.style.backgroundColor='#9D6CFF'\" onclick=\"window.commands?.execute('create-mitosheet-from-dataframe-output');\">See Full Dataframe in Mito</div> <script> if (window.commands.hasCommand('create-mitosheet-from-dataframe-output')) document.getElementById('dd01f9ba-4af2-407a-bb81-cad86b9739a3').style.display = 'flex' </script> <table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Name</th>\n",
       "      <th>AGE</th>\n",
       "      <th>weight__</th>\n",
       "      <th>Admission Date</th>\n",
       "      <th>email_address</th>\n",
       "      <th>Country of Birth</th>\n",
       "      <th>Contact (Numbers)</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>Abby</td>\n",
       "      <td>12</td>\n",
       "      <td>32.5</td>\n",
       "      <td>2020-01-01</td>\n",
       "      <td>abby@gmail.com</td>\n",
       "      <td>CA</td>\n",
       "      <td>1-789-456-0123</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>Scott</td>\n",
       "      <td>33</td>\n",
       "      <td>47.1</td>\n",
       "      <td>2020-01-15</td>\n",
       "      <td>scott@gmail.com</td>\n",
       "      <td>Canada</td>\n",
       "      <td>1-123-456-7890</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>Scott</td>\n",
       "      <td>33</td>\n",
       "      <td>47.1</td>\n",
       "      <td>2020-01-15</td>\n",
       "      <td>scott@gmail.com</td>\n",
       "      <td>Canada</td>\n",
       "      <td>1-123-456-7890</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>Scott2</td>\n",
       "      <td>56</td>\n",
       "      <td>55.2</td>\n",
       "      <td>2020-09-01</td>\n",
       "      <td>test@abc.com</td>\n",
       "      <td>NULL</td>\n",
       "      <td>1-456-123-7890</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "      <td>NaN</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>NULL</td>\n",
       "      <td>NULL</td>\n",
       "      <td>NULL</td>\n",
       "      <td>NULL</td>\n",
       "      <td>NULL</td>\n",
       "      <td>NULL</td>\n",
       "      <td>NULL</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table></div>"
      ],
      "text/plain": [
       "     Name   AGE weight__ Admission Date    email_address Country of Birth  \\\n",
       "0    Abby    12     32.5     2020-01-01   abby@gmail.com               CA   \n",
       "1   Scott    33     47.1     2020-01-15  scott@gmail.com           Canada   \n",
       "2   Scott    33     47.1     2020-01-15  scott@gmail.com           Canada   \n",
       "3  Scott2    56     55.2     2020-09-01     test@abc.com             NULL   \n",
       "4     NaN   NaN      NaN            NaT              NaN              NaN   \n",
       "5    NULL  NULL     NULL           NULL             NULL             NULL   \n",
       "\n",
       "  Contact (Numbers)  \n",
       "0    1-789-456-0123  \n",
       "1    1-123-456-7890  \n",
       "2    1-123-456-7890  \n",
       "3    1-456-123-7890  \n",
       "4               NaN  \n",
       "5              NULL  "
      ]
     },
     "execution_count": 1,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "df = pd.DataFrame({\"Name\": [\"Abby\", \"Scott\", \"Scott\", \"Scott2\", np.nan, \"NULL\"],\n",
    "                   \"AGE\": [12, 33, 33, 56,  np.nan, \"NULL\"],\n",
    "                   \"weight__\": [32.5, 47.1, 47.1, 55.2, np.nan, \"NULL\"],\n",
    "                   \"Admission Date\": [\"2020-01-01\", \"2020-01-15\", \"2020-01-15\", \"2020-09-01\", pd.NaT, \"NULL\"],\n",
    "                   \"email_address\": [\"abby@gmail.com\",\"scott@gmail.com\", \"scott@gmail.com\", \"test@abc.com\", np.nan, \"NULL\"],\n",
    "                   \"Country of Birth\": [\"CA\",\"Canada\", \"Canada\", \"NULL\", np.nan, \"NULL\"],\n",
    "                   \"Contact (Numbers)\": [\"1-789-456-0123\",\"1-123-456-7890\",\"1-123-456-7890\",\"1-456-123-7890\", np.nan, \"NULL\" ],\n",
    "\n",
    "})\n",
    "df"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "active-bracket",
   "metadata": {},
   "source": [
    "## Default `clean_df_gui`\n",
    "\n",
    "By default, `clean_df_gui` will open a web service on port 7680 for users to clean their dataframe."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "certified-concern",
   "metadata": {},
   "outputs": [],
   "source": [
    "from dataprep.clean import clean_df_gui\n",
    "clean_df_gui(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3084698",
   "metadata": {},
   "source": [
    "![init_view](./assets/init_view.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "55d5903d",
   "metadata": {},
   "source": [
    "## Clean Whole DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a658562b",
   "metadata": {},
   "source": [
    "Click \"Whole DF\" and select parameters. Then click \"OK\" to clean the whole DataFrame. \n",
    "\n",
    "After cleaning, the result is shown as following:\n",
    "![whole_df](./assets/whole_df.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "dadebea9",
   "metadata": {},
   "source": [
    "## Clean One Column with Clean Funtions"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89d843eb",
   "metadata": {},
   "source": [
    "Select proper clean functions for single column. Then select the column need to be cleaned and select parameters for the clean function. Then click \"OK\" to clean the selected column. \n",
    "\n",
    "We take `Phone Number` as example. After cleaning, the result is shown as following:\n",
    "![single_col](./assets/single_col.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b77f2fcf",
   "metadata": {},
   "source": [
    "## Export Execution Log\n",
    "\n",
    "Users can click the button of exporting log:\n",
    "![click_log](./assets/click_log.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f45be98b",
   "metadata": {},
   "source": [
    "Then we get the exported log shown in Jupyter Notebook:\n",
    "![exported_log](./assets/exported_log.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d3de05c1",
   "metadata": {},
   "source": [
    "## Export DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6c9e0faa",
   "metadata": {},
   "source": [
    "Users can click the button of exporting DataFrame:\n",
    "![click_df](./assets/click_df.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aa3053e6",
   "metadata": {},
   "source": [
    "Then we get the exported DataFrame shown in Jupyter Notebook:\n",
    "![exported_df](./assets/exported_df.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e49b3499",
   "metadata": {},
   "source": [
    "## Export CSV"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8a36cbea",
   "metadata": {},
   "source": [
    "Users can click the button of exporting CSV to their local directories:\n",
    "![click_csv](./assets/click_csv.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57fb7941",
   "metadata": {},
   "source": [
    "## Back to Origin Data"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a373203e",
   "metadata": {},
   "source": [
    "Users can click the `Origin` button to get the origin data and clean up all operations:\n",
    "![click_origin](./assets/click_origin.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b5179aa",
   "metadata": {},
   "source": [
    "After clicking `Origin`, the origin data can be exhibited and all execution logs are cleaned up:\n",
    "![after_origin](./assets/after_origin.png)\n",
    "![log_after_origin](./assets/log_after_origin.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2d99a0d3",
   "metadata": {},
   "source": [
    "## Undo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "45d86475",
   "metadata": {},
   "source": [
    "Users can click the `Undo` button to get one step back:\n",
    "![click_undo](./assets/click_undo.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6933fbd0",
   "metadata": {},
   "source": [
    "After clicking `Undo`, the clean steps can get one step back:\n",
    "![after_undo](./assets/after_undo.png)\n",
    "![log_after_origin](./assets/log_after_origin.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a5998f3b",
   "metadata": {},
   "source": [
    "## Redo"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ba03d856",
   "metadata": {},
   "source": [
    "Users can click the `Redo` button to get one step forward:\n",
    "![click_redo](./assets/click_redo.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b35b63d8",
   "metadata": {},
   "source": [
    "After clicking `Redo`, the clean steps can get one step forward:\n",
    "![after_redo](./assets/after_redo.png)\n",
    "![log_after_redo](./assets/log_after_redo.png)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "24547ab8",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
